RAID and NASD
NASD comments from reading:
- bad explanation of architecture
- file manager not well explained
RAID comments from reading:
- Failure assumptions may be limited (e.g. no correlated failures – what about batch failures?)
- Caching not considered
RAID background
Problem: technology trends
- computers getting larger, need more disk bandwidth
- disk bandwidth not riding Moore's law
- faster CPU enables more computation to support storage
- data intensive applications
Approaches:
- SLED: single large expensive disk
- RAID: redundant array of (independent, inexpensive) disks
NOTE:
- Disk arrays had been done before
- Contribution of this paper is a taxonomy and a way to compare them and organize them
Key ideas:
- striping: write blocks of a file to multiple disks, can read/write in parallel
- Redundancy: write extra data to extra disks for failure recovery, e.g. parity, ECC, or duplicate data. Redundancy can even improve read performance – with mirrored copies there is a choice of disk (latency) or two disks to read from (throughput)
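A minimal sketch of the parity flavor of redundancy (block contents are made up; XOR parity as used in RAID 3-5):

    def xor_blocks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    data_blocks = [b"AAAA", b"BBBB", b"CCCC"]      # one block per data disk in the stripe
    parity = b"\x00" * 4
    for blk in data_blocks:
        parity = xor_blocks(parity, blk)           # parity disk stores XOR of the stripe

    rebuilt = parity                               # disk 1 fails: XOR the survivors back in
    for i, blk in enumerate(data_blocks):
        if i != 1:
            rebuilt = xor_blocks(rebuilt, blk)
    assert rebuilt == data_blocks[1]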
Why arrays?
- Cheaper disks
- Lower power
- Smaller enclosures
- Higher reliability
o Can survive a disk failure
- Larger bandwidth
o Can read or write multiple disks at a time
How do you compare disk setups?
- Price?
- Power?
- Size?
- Performance?
o What performance?
o Large reads
o Small reads
o Large writes
o Small writes
o Read / modify / write (transaction processing)
Organization:
- take N disks, put into groups of G
RAID versions:
JBOD: just a bunch of disks, mount as separate volumes
- Read / write performance for a file limited to single disk
- Reliability for a byte is same as single disk, but file system can tolerate some disk failures with partial data loss
RAID 0: striping
- Striping data across disks
- Best overall performance: G reads/sec, G writes/sec
- Worst reliability: MTTF = MTTF(disk) / G
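A sketch of the striping arithmetic (round-robin block placement assumed; the last lines restate the MTTF formula above):

    def locate(block_num, G):
        # Logical block b -> (disk b mod G, offset b div G) under round-robin striping.
        return block_num % G, block_num // G

    G = 4
    print([locate(b, G) for b in (0, 1, 4, 5)])    # [(0, 0), (1, 0), (0, 1), (1, 1)]

    # Any one disk failing loses data, so MTTF drops linearly with group size:
    mttf_disk_hours = 30_000
    print(mttf_disk_hours / G)                     # 7500.0 hours for the 4-disk stripe set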
RAID 1: mirroring
- store all data on two disks
- write to both disks
- read from whichever disk is better positioned (faster access) – see the sketch after this list
- Write performance = single disk
- Read performance = double
- Overhead is 100%
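A tiny sketch of picking the better-positioned mirror for a read (track numbers are illustrative):

    def pick_mirror(target_track, head_positions):
        # Both copies hold the data; read from the copy whose head is closest to the target track.
        return min(range(len(head_positions)), key=lambda d: abs(head_positions[d] - target_track))

    print(pick_mirror(500, [100, 620]))            # -> 1: mirror 1's head is nearer track 500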
RAID 2: bit-wise ECC
- stripe data across disks in small units
- Store ECC (Hamming code) bit-wise across multiple check disks
- All reads / writes hit all disks
- Can detect / correct lots of errors
- Bad performance for small requests: every access involves the whole group, so per-disk throughput on small requests is roughly 1/G of a standalone disk
- Multiple check disks also cost more capacity than RAID 3's single parity disk
RAID 3: bit parity
- rely on the disk controller to report which disk failed, so a single parity disk is enough for correction (no ECC needed to locate the error)
- Still read all data disks on every access (parity read only on failure); writes hit all disks including parity
RAID 4: block parity
- use a single parity disk for error correction; rely on disk controllers for error detection
- Can read from a single disk (parity is not needed for reads)
- can write to two disks (data disk + update parity)
- Bottleneck: single parity disk for all writes
- Small writes require 4 accesses: read old block and old parity, then write new block and new parity
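A sketch of those four accesses against toy in-memory disks (XOR parity; only the target data disk and the parity disk are touched):

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    BLOCK = 4
    disks = {d: bytearray(BLOCK * 4) for d in range(5)}            # toy disks; disk 4 holds parity

    def small_write(data_disk, block_no, new_data, parity_disk=4):
        off = block_no * BLOCK
        old_data   = bytes(disks[data_disk][off:off + BLOCK])      # 1. read old data block
        old_parity = bytes(disks[parity_disk][off:off + BLOCK])    # 2. read old parity block
        new_parity = xor(xor(old_parity, old_data), new_data)      # parity' = parity ^ old ^ new
        disks[data_disk][off:off + BLOCK]   = new_data             # 3. write new data block
        disks[parity_disk][off:off + BLOCK] = new_parity           # 4. write new parity block

    small_write(2, 0, b"WXYZ")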
RAID 5: distributed parity
- same as level 4 but parity disk changes for each block
- Removes hotspot of parity disk
- Large writes efficient – just one extra access for parity
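One possible rotation of the parity location (the scheme only requires that it vary per stripe; this particular formula is illustrative):

    def parity_disk(stripe, G):
        # Rotate the parity location per stripe; RAID 4 would always return G - 1.
        return (G - 1 - stripe) % G

    G = 5
    print([parity_disk(s, G) for s in range(6)])   # [4, 3, 2, 1, 0, 4]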
RAID 6: more error correction
- 2 parity/check disks allow the array to tolerate 2 simultaneous disk failures
Throughput per dollar (relative to a single disk):

         | small read | small write   | large read | large write | storage efficiency | reason
RAID 0   | 1          | 1             | 1          | 1           | 1                  |
RAID 1   | 1          | 1/2           | 1          | 1/2         | 1/2                | extra (mirror) disk
RAID 3   | 1/G        | 1/G           | (G-1)/G    | (G-1)/G     | (G-1)/G            | one disk (parity) doesn't contribute
RAID 5   | 1          | max(1/G, 1/4) | 1          | (G-1)/G     | (G-1)/G            |
Notes: RAID 2 is inferior – like RAID 3 but with more check drives. RAID 4 is inferior to RAID 5 – similar best case, but write throughput is limited by the single parity disk
Choices of RAID
- QUESTION: what should you choose, when?
- Issues:
o Cost of disks – is it relevant? Perhaps space/power more relevant
o Workload: lots of small reads/writes favors RAID 1; mostly large reads and writes favors RAID 5
NASD
Technology trends:
- need distributed file system
- file server is bottleneck between client and data
- QUESTION: how do you scale up a file system?
o A: partition
¤ Still limited by disk → server bandwidth
¤ Partitioning usually limited to certain areas, e.g. volumes, mount points
Approaches:
- SAN: storage area networks
o attach disks to network
o Block level interface (read block, write block)
o Cooperating file systems to make it work
o Offers block-level management: backup, shadow, RAID
- NAS: network attached storage
o Richer interface to data: e.g. file systems, objects
o Inherits SAN benefits if implemented on SAN
- NASD: network attached disks
PROBLEM STATEMENT:
- bandwidth to clients limited by need for a centralized file manager
o QUESTION: Why?
o FS semantics, consistency, naming
- Routing data through the file server requires unnecessary copies:
o off disk onto the network
o network into server memory
o server memory back onto the network
o network into client memory
ENABLING TECHNOLOGY:
- I/O bound applications: multimedia, databases, data mining
- New drive interfaces: drives can be attached directly to the network (e.g. iSCSI)
- Smarter drives – more on-drive processing, so more opportunity to program them
- Storage networks and computer networks converging
- Storage servers (e.g. NFS, AFS) not cost effective: the server is the dominant cost unless many disks are attached
NASD Idea:
- separate metadata & management from data transfer
- Provide a security mechanism so the disk can sit directly on the network, without interposed control
- Principles:
o Data transferred directly from disk to client, not through the server
o Asynchronous oversight: client can perform operations w/o synchronous access to manager. E.g. can read / write data without contacting manager. Policy info provided by manager as a capability, enforced by disk
o Object based interface: not blocks or files, but variable-length objects. File manager can use them as whole files or stripes. Provides more semantics for disk – more information available
- Client talks to the file manager to open files, create directories, etc.
- File manager returns a capability that allows client to access disk directly
NASD interface:
- functions to access objects
- Secured with capabilities (like Kerberos tickets)
o Encrypted with disk key
o Contains private session key
o Client must prove it knows the session key with an authenticator
o May contain policy for disk to enforce
o Contains byte range for access (e.g. can limit to part of the object)
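A toy sketch of the capability flow (field names and the HMAC construction are assumptions for illustration, not the actual NASD wire format): the file manager mints a capability under a key it shares with the drive, the client proves it holds the session key with an authenticator over its request, and the drive checks everything locally.

    import hmac, hashlib, json, os

    DRIVE_KEY = os.urandom(32)        # key shared by file manager and drive (never by clients)

    def mint_capability(object_id, rights, byte_range):
        """File manager: bind a session key to a policy the drive will enforce."""
        args = {"object": object_id, "rights": rights, "range": byte_range}
        blob = json.dumps(args, sort_keys=True).encode()
        session_key = hmac.new(DRIVE_KEY, blob, hashlib.sha256).digest()   # drive can re-derive this
        return args, session_key          # args travel with requests; session key goes only to the client

    def make_authenticator(session_key, request):
        """Client: prove it knows the session key by MACing its request."""
        return hmac.new(session_key, json.dumps(request, sort_keys=True).encode(), hashlib.sha256).digest()

    def drive_check(args, request, authenticator):
        """Drive: re-derive the session key from the capability args and verify the request."""
        blob = json.dumps(args, sort_keys=True).encode()
        session_key = hmac.new(DRIVE_KEY, blob, hashlib.sha256).digest()
        ok_mac = hmac.compare_digest(authenticator, make_authenticator(session_key, request))
        lo, hi = args["range"]
        in_range = lo <= request["offset"] and request["offset"] + request["length"] <= hi
        return ok_mac and request["op"] in args["rights"] and in_range

    # Flow: manager mints the capability, client issues reads, drive enforces policy with no callback.
    args, key = mint_capability("obj7", ["read"], (0, 1 << 20))
    req = {"op": "read", "offset": 0, "length": 4096}
    print(drive_check(args, req, make_authenticator(key, req)))   # True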
USING NASD
NFS:
- files == objects
- Lookup done on server, return capabilities
- Attributes map onto object attributes, or are stored uninterpreted by the disk and interpreted by the client-side NFS library
AFS:
- files == objects
- Clients parse directories, must ask file manager for a capability to a file
- Consistency model (invalidate callbacks on write) changes because writes not reported to manager; manager instead invalidates on open-for-write
- Quotas handled by granting access to more data than current size (update after close)
NASD PFS
- parallel file system by striping data across disks
- New storage layer, Cheops, implements striping (RAID 0) but same object interface
o Translates an access to a striped object into accesses to the component objects (with a capability for each) that the client then issues directly to the drives
o Stripes data in 512 KB chunks
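A sketch of how a striping layer like Cheops might map a byte range of a striped object onto per-drive component objects (the 512 KB stripe unit is from the notes; the layout and names are otherwise illustrative):

    STRIPE_UNIT = 512 * 1024   # 512 KB chunks

    def map_request(offset, length, num_drives):
        # Map a byte range of the logical object onto (drive, component offset, length) pieces,
        # one component object per NASD drive; the client reads each piece with its capability.
        pieces = []
        end = offset + length
        while offset < end:
            chunk = offset // STRIPE_UNIT
            drive = chunk % num_drives                        # RAID 0 style round-robin placement
            within = offset % STRIPE_UNIT
            comp_off = (chunk // num_drives) * STRIPE_UNIT + within
            n = min(STRIPE_UNIT - within, end - offset)
            pieces.append((drive, comp_off, n))
            offset += n
        return pieces

    print(map_request(512 * 1024 - 10, 30, num_drives=4))
    # -> [(0, 524278, 10), (1, 0, 20)]: the request crosses a stripe-unit boundary onto two drives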